Spontaneous instrumental avoidance learning in social contexts

Adaptation to our social environment requires learning how to avoid potentially harmful situations, such as encounters with aggressive individuals. Threatening facial expressions can evoke automatic stimulus-driven reactions, but whether their aversive motivational value suffices to drive instrumental active avoidance remains unclear. When asked to freely choose between different action alternatives, participants spontaneously—without instruction or monetary reward—developed a preference for choices that maximized the probability of avoiding angry individuals (sitting away from them in a waiting room). Most participants showed clear behavioral signs of instrumental learning, even in the absence of an explicit avoidance strategy. Inter-individual variability in learning depended on participants’ subjective evaluations and sensitivity to threat approach feedback. Counterfactual learning best accounted for avoidance behaviors, especially in participants who developed an explicit avoidance strategy. Our results demonstrate that implicit defensive behaviors in social contexts are likely the product of several learning processes, including instrumental learning.


Statistical analyses
We ran the same mixed logistic models (including the effect of strategy) as in the main experiment.
Because this experiment, unlike the main task, did not include the subjective evaluation task, we could run neither the model on the subjective evaluation data nor the model on the probability of response repetition predicted by the subjective value of the feedback. Results are reported in Table S3 and visualized in Figure S2.

Subjects' debriefing: list of questions
After reinforcement learning task
14. Have you ever been diagnosed with a neurological or psychiatric disease?
15. We are facing a global health crisis. Do you think this affected the way you performed the task? If yes, describe in a few words how.

Figure S1. Results from the subjective evaluation task.
Left: results for the whole sample. Right: results split by strategy, with the group without an explicit avoidance strategy in blue and the group with an explicit strategy in violet. Bar heights and black points represent the mean. Error bars represent 95% confidence intervals for the normal distribution. Shaded points represent single subjects' means.
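The error bars in this caption are described as 95% confidence intervals for the normal distribution. As a minimal sketch of one common way to compute such an interval (mean plus or minus the normal quantile times the standard error of the mean), assuming that this is the construction intended; the function name `normal_ci` and the example values are hypothetical and not taken from the study:

```python
from statistics import NormalDist, mean, stdev


def normal_ci(values, level=0.95):
    """Return (lower, upper) bounds of a normal-approximation confidence
    interval around the mean of `values`.

    Assumption: the interval is mean +/- z * SEM, with z the two-sided
    normal quantile (about 1.96 for a 95% interval).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided normal quantile
    sem = stdev(values) / n ** 0.5              # standard error of the mean
    m = mean(values)
    return m - z * sem, m + z * sem


# Hypothetical per-subject mean ratings, for illustration only.
subject_means = [0.4, 0.5, 0.6]
lower, upper = normal_ci(subject_means)
```

With small samples a t-based interval would be wider; the sketch follows the normal-distribution wording of the caption.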

Figure S2. Summary of behavioral results for the pilot study.
Left: mean proportion of hits throughout the task for the group without an explicit avoidance strategy in blue and the group with an explicit strategy in violet (the same color code applies to the remaining panels of the figure). Red points represent the mean and error bars represent 95% confidence intervals for the normal distribution. Shaded points represent single subjects' means, and the shade of grey indicates whether, within each subject, the binomial test against chance (0.5) is significant (dark grey) or not (light grey). Right top: mean proportion of hits across the first 20 trials over blocks of stable action-outcome contingency (trial 1 = reversal trial). Points represent means within trial and error bars represent 95% confidence intervals for the normal distribution. The fitted curves represent the best fit (and 95% confidence interval) of the same hyperbolic function used in the mixed linear models (see Methods). Right bottom: mean proportion of action repetition following either approach or avoidance feedback. Black points represent means and error bars represent 95% confidence intervals for the normal distribution. Shaded points represent single subjects' means.
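The per-subject binomial test against chance (0.5) mentioned in this caption can be illustrated with a short sketch. It assumes an exact two-sided test and a significance threshold of 0.05; neither detail is stated in this section, and the function name and the example counts are hypothetical:

```python
from math import comb


def binomial_test_two_sided(hits, n_trials, p_chance=0.5):
    """Exact two-sided binomial test p-value.

    Sums the probabilities of all outcomes that are at most as likely
    as the observed number of hits under the chance probability.
    """
    pmf = [
        comb(n_trials, k) * p_chance ** k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_trials + 1)
    ]
    observed = pmf[hits]
    # Small relative tolerance so that ties are not lost to rounding error.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical subject: 60 hits out of 100 trials.
p = binomial_test_two_sided(60, 100)
significant = p < 0.05  # assumed threshold; decides dark vs light grey
```

A subject whose hit count is close to half of the trials yields a large p-value and would be drawn in light grey under this scheme.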

Figure S4. Results of RL models in which we used as reward value (R) the subjective value estimate provided by the subjects in the subjective evaluation task.
As for the GLM (model 4, see Methods), at each trial we entered in the RL model the subjective value, obtained in the subjective evaluation task, that corresponded to the actual feedback received on that trial of the reinforcement learning task. Values were scaled from 0 to 1. All information provided in the legend for Fig. 3 in the paper applies to Fig. S4. The figure highlights the differences between the group without an explicit avoidance strategy (light blue) and the group with an explicit strategy (violet). Predictions from the simple and the counterfactual reinforcement learning models are in blue and turquoise, respectively (solid lines for the group without an explicit strategy and dotted lines for the group with an explicit strategy). Blue and turquoise points represent means of simulations, and the fitted curve in the left top graph represents the best fit (and 95% confidence interval) of the same hyperbolic function used in the mixed linear models, fitted on simulated data. Right bottom: correlation between real and simulated mean hit proportions for the simple and the counterfactual models, as a function of the presence (violet) or the absence (light blue) of an explicit avoidance strategy.
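To make the contrast between the two model families concrete, the following is a minimal sketch of a delta-rule update with a reward scaled to the 0 to 1 range. It is not the authors' fitted model: the update of the unchosen option toward `1 - reward`, the use of a single learning rate `alpha`, the softmax choice rule with inverse temperature `beta`, and all function names are assumptions made for illustration.

```python
import math


def min_max_scale(values):
    """Rescale subjective values linearly so they span 0 to 1."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot scale constant values")
    return [(v - low) / (high - low) for v in values]


def update_simple(q, choice, reward, alpha):
    """Simple model: update only the value of the chosen option."""
    q = list(q)
    q[choice] += alpha * (reward - q[choice])
    return q


def update_counterfactual(q, choice, reward, alpha):
    """Counterfactual model (assumed form): also update the unchosen
    option, as if it had produced the complementary outcome."""
    q = update_simple(q, choice, reward, alpha)
    other = 1 - choice
    q[other] += alpha * ((1.0 - reward) - q[other])
    return q


def p_choose_first(q, beta):
    """Softmax probability of choosing option 0 among two options."""
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))


# Hypothetical ratings of three feedback types, scaled to [0, 1].
scaled = min_max_scale([-2.0, 0.0, 2.0])
q_simple = update_simple([0.5, 0.5], choice=0, reward=scaled[2], alpha=0.2)
q_counter = update_counterfactual([0.5, 0.5], choice=0, reward=scaled[2], alpha=0.2)
```

In this sketch a single outcome moves both option values in the counterfactual variant, so the preference between the two options changes faster than in the simple variant.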

Figure S5. GLM results in seemingly non-learners, as defined by the mean hit proportion.
Most subjects (n = 145; 119 without an explicit strategy, 26 with an explicit strategy) had a non-significant binomial test against chance level on the mean proportion of hits throughout the task (Figure 2, top right). To investigate whether signs of learning also emerge in this sub-sample of seemingly non-learners, we plotted the proportion of repetition for the main study as a function of the objective feedback at t − 1 (top) or of the subjective value attributed to this feedback (bottom). See Table S6 for statistical details.

It did at times. Especially if I got someone who reacted particularly negatively. This tended to be more with the women. One might as well chance ones arm on the other side! The blokes I tended to just shrug off. On other occasions I just ignored it and sat where I wanted. I am exhausted atm so this was for most of the survey.